Abstract 

In many large multiple testing problems the hypotheses are divided into families. 
Given the data, families with evidence for true discoveries are selected, and hypotheses 
within them are tested. Neither controlling the error-rate in each family separately nor 
controlling the error-rate over all hypotheses together can assure that an error-rate is 
controlled in the selected families. We formulate this concern about selective inference 
in its generality, for a very wide class of error-rates and for any selection criterion, and 
present an adjustment of the testing level inside the selected families that retains the 
average error-rate over the selected families. 

Keywords: false discovery rate, family-wise error rate, hierarchical testing, multiple 
testing, selective inference. 
1 Introduction 



In modern statistical challenges one is often presented with a (possibly) large set of 
large families of hypotheses. In fMRI research interest lies with the locations (voxels) 
of activation while a subject is involved in a certain cognitive task. The brain is divided 
into regions (either anatomic or functional), and the hypotheses regarding the locations 
in each region define a family (see, e.g., Benjamini and Heller, 2007, Pacifico et al., 
2004). Searching for differentially expressed genes, the genes are often divided into gene 
sets, defined by prior biological knowledge. Each gene set defines a family of hypotheses 
(see Subramanian et al., 2005, Heller et al., 2009). In the above examples, the families 
are clusters of units of interest: voxels or genes. Another problem having a similar 
structure can be identified in multi-factor analysis of variance (ANOVA), where for each 
factor interest lies with the family of pairwise comparisons between the levels of that 
factor. Sometimes the set of hypotheses has a complex structure and can be divided 
into families in different ways. An example of such research is the voxelwise genome-wide 
association study, see Stein et al. (2010). In this study the relation between 
448,293 Single Nucleotide Polymorphisms (SNPs) and volume change in 31,622 voxels 
(a total of 448,293 × 31,622 hypotheses) is explored across 740 elderly subjects. We may 
view this problem as a family for each SNP, or a family for each voxel. This example 
is considered in detail in Section 4. 

Since in many of these cases the families are large, and we are searching for the few 
interesting significant findings in each family, it is essential to control for multiplicity 
within each family. Efron (2008) showed that controlling for multiplicity globally on 
the pooled set of hypotheses (ignoring the division of this set into families) may distort 
the inferences inside the families in both directions - the true discoveries may remain 
undiscovered in families rich in potential discoveries and too many false ones may be 
discovered in families rich in true null hypotheses. 

Many error criteria are in use to deal with the possible inflation of false discoveries 
when facing multiplicity. In this work we address all error-rates that can be written 
as $E(C)$ for some (random) measure $C$ of the errors performed. These include the 
per-family error rate (PFER), where $C = V$, the number of type I errors made; the 
family-wise error rate (FWER), where $C = I_{\{V \geq 1\}}$; the false discovery rate (FDR), 
where $C = \mathrm{FDP}$, the proportion of false discoveries (introduced in Benjamini and 
Hochberg, 1995); the false discovery exceedance, FDX, i.e. $\Pr(\mathrm{FDP} > \gamma) = E(I_{\{\mathrm{FDP} > \gamma\}})$ 
for some pre-specified $\gamma$ (see van der Laan et al., 2004a, and Genovese and Wasserman, 
2006); and the generalized error rates, $k$-FWER, i.e. $\Pr(V \geq k) = E(I_{\{V \geq k\}})$ (see van 
der Laan et al., 2004, Lehmann and Romano, 2005), and $k$-FDR $= E(\mathrm{FDP} \cdot I_{\{V \geq k\}})$ 
(introduced in Sarkar, 2007). However, there are error-rates which cannot be written 
in the form $E(C)$ for some random variable $C$. For example, the Bayesian FDR (Fdr), 
proposed by Efron and Tibshirani (2002), is $E(V)/E(R)$, and is not of the form $E(C)$. The 
positive false discovery rate proposed by Storey (2003) is $E(V/R \mid R > 0) = \mathrm{FDR}/\Pr(R > 0)$, and 
also cannot be written as $E(C)$. See Farcomeni (2008) for a good review of multiple 
error criteria, the relationship between them and different multiple testing procedures. 
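To make the form $E(C)$ concrete, the following minimal Python sketch computes, for a single family, the realized values of $C$ behind several of the criteria above, given which hypotheses were rejected and which nulls are true (the latter is known only in simulations). The function name `error_measures` and its default `gamma` and `k` are illustrative assumptions; averaging these values over many simulated data sets estimates the corresponding error-rates.

```python
import numpy as np

def error_measures(rejected, true_null, gamma=0.1, k=2):
    """Realized error measures C for one family.

    rejected  : boolean array, True where the hypothesis is rejected
    true_null : boolean array, True where the null hypothesis is true
    """
    rejected = np.asarray(rejected, dtype=bool)
    true_null = np.asarray(true_null, dtype=bool)
    R = int(rejected.sum())                    # number of rejections
    V = int((rejected & true_null).sum())      # number of type I errors
    fdp = V / R if R > 0 else 0.0              # false discovery proportion (0 if R = 0)
    return {
        "PFER": V,                             # C = V
        "FWER": int(V >= 1),                   # C = I{V >= 1}
        "FDR": fdp,                            # C = FDP
        "FDX": int(fdp > gamma),               # C = I{FDP > gamma}
        "k-FWER": int(V >= k),                 # C = I{V >= k}
        "k-FDR": fdp * (V >= k),               # C = FDP * I{V >= k}
    }

# Ten hypotheses, the first four are true nulls; three rejections, one of them false.
true_null = np.arange(10) < 4
rejected = np.zeros(10, dtype=bool)
rejected[[0, 6, 7]] = True
print(error_measures(rejected, true_null))
```

Ratio-type criteria such as the pFDR or the Bayesian Fdr have no entry here, since they are not expectations of a single per-family quantity.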

Suppose we control a criterion at level $q$ in each family of hypotheses separately. Let 
$C_i$ be the random value of $C$ measuring the errors performed in family $i$, $i = 1, \ldots, m$, 
so $E(C_i) \leq q$. It follows trivially that we also control $E(C)$ on the average over all 
families, i.e. 

$$E\left(\frac{\sum_{i=1}^m C_i}{m}\right) \leq q. \qquad (1)$$

In many cases investigators tend to select promising families first, based on the 
data at hand, and then look for significant findings only within the selected families. 
In these cases it is common to control $E(C)$ in each selected family separately. 
Let us consider the widely used ANOVA with two or more factors as an example. 
The researcher first selects the significant factors, and then performs post-hoc tests 
(pairwise comparisons) within each selected factor. Usually the FWER is controlled 
within each selected family of pairwise comparisons using Tukey's procedure, but in 
large problems the FDR has also been suggested for that purpose (see Williams et al., 
1999). 

When considering only the selected families, we might wish to control the expected 
value of $C_i$ in each selected family. Unfortunately, the goal of such conditional control 
for any combination of selection rule and testing procedure and for any configuration 
of true null hypotheses is impossible to achieve, as shown in the following example. 

Example 1.1. Suppose one of the families, say family $i$, consists only of true null 
hypotheses. Let $E(C^*)$ be an error-rate which reduces to the FWER when all the null 
hypotheses are true (e.g. FWER, FDR, FDX). Let Proc* be an $E(C^*)$-controlling 
procedure applied in each selected family at some level $q'$. Consider the following selection 
rule $S$: select the families where at least one rejection is made by Proc*. If family $i$ 
is selected, there is at least one rejection in it, and since all of its hypotheses are true nulls, 
$C^*_i = I_{\{V_i > 0\}} = 1$. Hence, we have shown that for any $E(C^*)$-controlling procedure 
used for testing the selected families, we can find a selection rule $S$ such that 
$E(C^*_i \mid i \text{ is selected by } S) = 1$. 

Therefore, we would like to achieve a more modest goal: the control of the expected 
average value of $C$ over the selected families, where the average is taken to be 0 if no 
family is selected. 

Formally, let $P_i$ be the set of p-values belonging to family $i$, $i = 1, \ldots, m$. Let $P$ 
be the ensemble of these sets: $P = \{P_i\}_{i=1}^m$. Let $S$ be a selection procedure using as 
input the p-values $P$, identifying the indices of the selected families. Note that if the 
selection does not depend on $P$, the situation is similar to (1), so assuming $S = S(P)$ 
is not a restriction. Denote by $|S(P)|$ the number of selected families. The error criterion 
that interests us is 

$$E(C_S) = E\left(\frac{\sum_{i \in S(P)} C_i}{\max(|S(P)|, 1)}\right). \qquad (2)$$

Let us first illustrate how the choice of $C$ is reflected in the resulting error-rate. When 
$C = I_{\{V > 0\}}$, this error measure is the expected proportion of families with at least one 
type I error out of all the selected families. In this case it is similar to the OFDR defined 
in Heller et al. (2009) in the framework of microarray analysis. When $C = \mathrm{FDP}$, 
the error measure in (2) becomes less stringent: it is the expected average FDP over 
the selected families. The difference between the average FDP and the proportion 
of families with at least one type I error may be very large. If three families are 
selected, with false discovery proportions equal to 0.04, 0.05 and 0.06 respectively, the 
average FDP is 0.05, whereas the proportion of families with at least one type I error 
is 1. The choice between these two error-rates should be guided by the application. 
If one can bear some false discoveries in the selected families as long as the average 
FDP over the selected families is small, the control of the expected average FDP 
may suffice. Alternatively, if one wishes to avoid even one false discovery in a selected 
family, control of the expected proportion of families with at least one type I error 
would be appropriate. 

Now we would like to illustrate the difference between controlling the expected 
average FDP over the selected families and controlling the FDR globally for the 
combined set of discoveries. Assume 40 families of hypotheses are selected. There are 36 
families with one rejection in each, and there are no false discoveries in these families. 
In each of the remaining 4 families there are 10 rejections, 5 of which are false 
discoveries. Thus in 36 selected families FDP = 0, while in the remaining 4 families 
FDP = 0.5. The average FDP over the selected families is $4 \cdot 0.5/40 = 0.05$. The total 
number of discoveries is 76, 20 of which are false discoveries. Therefore the FDP 
for the combined set of discoveries is $20/76 \approx 0.26$. This simple example suggests that 
control of the expected average FDP over the selected families does not imply control 
of the FDR for the combined set of discoveries. Let us now illustrate that controlling 
the FDR for the combined set of discoveries need not guarantee control of the expected 
average FDP over the selected families. Assume that there are 20 selected families with 
one erroneous and one correct rejection each, whereas in each of the other 20 families there 
are 18 rejections, all of which are correct. The total number of discoveries is 400, 20 
of which are false discoveries. Therefore, the FDP for the combined set of discoveries 
is $20/400 = 0.05$. However, the average FDP over the selected families is $20 \cdot 0.5/40 = 0.25$. 
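The arithmetic of the examples above can be reproduced with a few lines of Python. In this sketch (the function name `summarize` is illustrative) each selected family is described only by its number of rejections and its number of false discoveries; the counts assigned to the three families with FDP 0.04, 0.05 and 0.06 are an arbitrary choice that yields those proportions.

```python
def summarize(families):
    """families: list of (rejections, false_discoveries) pairs, one per selected family."""
    n_sel = max(len(families), 1)
    fdps = [v / r if r > 0 else 0.0 for r, v in families]
    avg_fdp = sum(fdps) / n_sel                               # average FDP over the selected families
    prop_any_error = sum(v >= 1 for _, v in families) / n_sel  # families with at least one type I error
    total_r = sum(r for r, _ in families)
    total_v = sum(v for _, v in families)
    global_fdp = total_v / total_r if total_r > 0 else 0.0     # FDP of the combined set of discoveries
    return avg_fdp, prop_any_error, global_fdp

# 36 families with one correct rejection, 4 families with 10 rejections (5 false).
scenario_1 = [(1, 0)] * 36 + [(10, 5)] * 4
# 20 families with one false and one correct rejection, 20 families with 18 correct ones.
scenario_2 = [(2, 1)] * 20 + [(18, 0)] * 20
# Three families with FDP 0.04, 0.05 and 0.06 (counts chosen only to give these ratios).
three = [(25, 1), (20, 1), (50, 3)]

for name, fams in [("scenario 1", scenario_1), ("scenario 2", scenario_2), ("three", three)]:
    avg, prop, glob = summarize(fams)
    print(f"{name}: average FDP = {avg:.3f}, families with V>=1 = {prop:.2f}, global FDP = {glob:.3f}")
```

The three quantities can rank the same configurations in opposite orders, which is the point of the two scenarios.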

Controlling an error-rate on the average over the selected families may have 
important advantages over controlling it globally for the combined set of discoveries. 
It does give some level of confidence in the discoveries within each selected family, 
even if only on the average. In many applications controlling an error-rate on the 
average over the selected families is simply a more appropriate measure of error for the 
interpretation of the results than controlling an error-rate globally for the combined 
set of discoveries. This important point is addressed in Section 3, where we discuss the 
structure of the families in view of the relevance of the control on the average over the 
selected families, and is illustrated with an application in Section 4. Even in problems where 
no selection takes place, Efron (2008) argues that one should obtain control of 
an error-rate in each family separately, implying control on the average (over all the 
families). When selection takes place, control on the average (over the selected 
families) becomes even more important. Finally, in some cases power may be gained 
by controlling an error-rate on the average rather than globally for the combined set 
of discoveries, even though this is not the motivating reason for our emphasis on 
control on the average over the selected families. 

Control on the average over the selected is a manifestation of the selective inference 
ideas developed in Benjamini and Yekutieli (2005). In that paper the authors made an 
important distinction between simultaneous and selective goals in inference on multiple 
parameters, in the context of confidence intervals (CIs) for the selected parameters. 
Simultaneous inference is relevant when control of the probability that at least one 
CI does not cover its parameter is needed. As a result, the simultaneous control also 
holds for any selected subset. However, when CIs are built only for one set of selected 
parameters, the goal need not be that strict, and the authors suggest a more liberal 
property: control of the expected proportion of parameters not covered by their CIs 
among the selected parameters, where the proportion is 0 if no parameter is selected 
(the false coverage-statement rate, FCR). Setting the goal of selective inference for the 
testing of multiple families and adopting the error measure in (2) is analogous to the FCR 
in the framework of building CIs for the selected parameters. 

Note that $E(C_S)$ reduces to the left-hand side of (1) when $S(P) = \{1, \ldots, m\}$, i.e. when there is 
practically no selection. Unfortunately, when the error measure is averaged over the 
selected families, the expectation is not controlled, as demonstrated in the following 
example. 

Example 1.2. A family of $n$ hypotheses, for each of which we have a p-value, is 
selected if the minimum p-value in it is less than 0.05. Each selected family is tested 
using the Bonferroni procedure at level $\alpha = 0.05$. Further assume we have $m$ such families, 
and all the null hypotheses are true (with uniformly distributed p-values). Obviously, 
the expected value of the average of $I_{\{V \geq 1\}}$ is controlled when the average is taken over 
all families. Let us demonstrate what happens to the average of $C = I_{\{V \geq 1\}}$ (average 
FWER hereafter) when taken only over the selected families, namely to $E(C_S)$, for 
various values of $m$ and $n$. In this case it can be explicitly determined (see Appendix 
A), and is given in the following table. 



m      n      $E(|S(P)|/m)$    $E(C_S)$
20     100    0.99             0.049
100    20     0.64             0.076
100    10     0.40             0.122
100    2      0.10             0.506

Table 1: Illustration of the selection bias in Example 1.2. There are $m$ families with $n$ 
hypotheses in each. All hypotheses are null. $S(P)$ is the selected set of families, containing 
all families whose minimum p-value is less than 0.05. Each selected family is tested 
using the Bonferroni procedure at level 0.05, assuring that $E(C_i) = E(I_{\{V_i \geq 1\}}) \leq 0.05$. $C_S$ 
is the proportion of selected families in which at least one type I error was made. It can be seen 
that as the selection becomes more stringent, the selection bias is more severe. 
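The entries of Table 1 can be reproduced in closed form. The sketch below uses a short derivation of our own, which may differ from the one in the appendix: under the example's assumptions a family is selected with probability $s = 1 - 0.95^n$, and Bonferroni makes at least one rejection with probability $f = 1 - (1 - 0.05/n)^n$; such a family is always selected, because $0.05/n \leq 0.05$. Since the families are independent, $E(C_S) = m f\, E[1/(1+B)]$ with $B \sim \mathrm{Binom}(m-1, s)$, and the identity $E[1/(1+B)] = (1-(1-s)^m)/(ms)$ gives $E(C_S) = f\,(1-(1-s)^m)/s$. The function name `example_1_2` is illustrative.

```python
def example_1_2(m, n, alpha=0.05, select_below=0.05):
    """Exact E(|S(P)|/m) and E(C_S) in Example 1.2 (all nulls, independent uniform p-values).

    Assumes alpha / n <= select_below, so a family with a Bonferroni rejection is selected.
    """
    s = 1 - (1 - select_below) ** n    # P(family selected) = P(min p < 0.05)
    f = 1 - (1 - alpha / n) ** n       # P(at least one Bonferroni rejection) = P(min p < alpha / n)
    # E(C_S) = m f E[1/(1+B)], B ~ Binom(m-1, s), and E[1/(1+B)] = (1 - (1-s)^m) / (m s)
    e_cs = f * (1 - (1 - s) ** m) / s
    return s, e_cs

for m, n in [(20, 100), (100, 20), (100, 10), (100, 2)]:
    sel, e_cs = example_1_2(m, n)
    print(f"m={m:4d} n={n:4d}  E(|S|/m)={sel:.2f}  E(C_S)={e_cs:.3f}")
```

Running it prints values that round to the entries of Table 1. The formula also shows why the bias grows as selection becomes more extreme: $f$ stays close to 0.05 while it is divided by a shrinking selection probability $s$.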



One can immediately observe from the last column that in this example the average 
FWER over the selected families can climb high and exceed 0.5, while with no 
selection the level should be 0.05. It is also clear that the average FWER over the 
selected increases as the extent of selection (presented in the third column) becomes 
more extreme. Similar results were observed for the average PFER ($E(C_S)$ for $C = V$) 
rather than the average FWER. In this particular example the extent of selection does not 
depend on the number of families $m$, but only on $n$; this need not be the case for 
other selection rules. 

The main result of this paper is that in order to assure the control of $E(C_S)$, we 
should control $E(C_i)$ in each selected family $i$ at a more stringent level: the nominal 
level $q$ should be multiplied by the proportion of the selected families among all the 
families. This result, under some limiting conditions, is the focus of Theorem 2.1. 
A general result of the same nature, covering more complicated selection rules, such 
as multiple comparisons procedures that make use of plug-in estimators, is given in 
Theorem 2.2. 

2 Selection adjusted testing of families 

When all the families are selected with probability 1, no adjustment to the testing 
levels is needed, because the average over the selected families is the average 
over all. As the selection rule becomes more stringent and tends to select fewer families, the 
adjustment should be more severe. For clarity of exposition and to build intuition, we 
first introduce the adjustment for simple selection rules, a notion introduced by Benjamini 
and Yekutieli (2005) in the context of parameter selection, and only then turn to the 
general case. 

Definition 2.1 (Simple selection rule). A selection rule is called simple if for each 
selected family $i$, when the p-values not belonging to family $i$ are fixed and the 
p-values inside family $i$ can change as long as family $i$ is selected, the number of selected 
families remains unchanged. 
It is easy to see that many selection rules are indeed simple in the above sense. Any 
rule where a family is selected based only on its own p-values is a simple selection rule, 
as in Example 1.1. In Section 3 we show that when the selection of the families is 
done using hypothesis testing, the widely used step-up and step-down multiple testing 
procedures provide simple selection rules, even though the decision whether a family is 
selected or not depends on the p-values belonging to other families as well. However, 
not all selection rules are simple: examples are adaptive multiple testing procedures, 
as noted in Section 3. 

The following procedure offers the selection adjustment when the families are 
selected using a simple selection rule. 

Procedure 2.1 (Simple Selection-Adjusted Procedure). 

1. Apply the selection rule $S$ to the ensemble of sets $P$, identifying the selected set of 
families $S(P)$. Let $R$ be the number of selected families (i.e. $R = |S(P)|$). 

2. Apply an $E(C)$-controlling procedure in each selected family separately at level 
$Rq/m$. 
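A minimal Python sketch of Procedure 2.1, assuming each family is given as an array of p-values and that the selection rule and the within-family procedure are passed in as functions. Bonferroni (for the FWER) and BH (for the FDR) are shown as two possible within-family procedures; the helper names (`simple_selection_adjusted`, `min_p_below`, and so on) are hypothetical, and the demo uses the selection rule of Example 1.2, which is simple because each family is selected based only on its own p-values.

```python
import numpy as np

def bonferroni(p, level):
    """Reject H_j when p_j <= level / n (controls the FWER at `level`)."""
    p = np.asarray(p, dtype=float)
    return p <= level / len(p)

def benjamini_hochberg(p, level):
    """BH step-up procedure at `level`; returns a boolean rejection mask."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    below = p[order] <= level * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest (0-based) j with p_(j) <= (j+1) level / n
        reject[order[: k + 1]] = True
    return reject

def simple_selection_adjusted(families, select, procedure, q):
    """Procedure 2.1: select families, then test each selected one at level R q / m."""
    m = len(families)
    selected = list(select(families))     # step 1: S(P)
    R = len(selected)
    level = R * q / m                     # step 2: adjusted testing level
    rejections = {i: procedure(families[i], level) for i in selected}
    return selected, level, rejections

def min_p_below(threshold):
    """Selection rule of Example 1.2: select families whose minimum p-value is below threshold."""
    return lambda fams: [i for i, p in enumerate(fams) if np.min(p) < threshold]

families = [np.array([0.001, 0.2, 0.7]), np.array([0.04, 0.5, 0.9]),
            np.array([0.3, 0.6, 0.8]), np.array([0.0002, 0.003, 0.5])]
selected, level, rej = simple_selection_adjusted(families, min_p_below(0.05), benjamini_hochberg, q=0.05)
print(selected, level, {i: np.nonzero(r)[0].tolist() for i, r in rej.items()})
```

Here three of the four families are selected, so each is tested at $3 \cdot 0.05/4 = 0.0375$ rather than at 0.05; passing `bonferroni` instead gives the FWER version.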

Theorem 2.1. If the p-values across the families are independent, then for any simple 
selection rule $S(P)$, for any error-rate $E(C)$ such that $C$ takes values in a countable 
set, and for any $E(C)$-controlling procedure valid for the dependency structure inside 
each family, the Simple Selection-Adjusted Procedure guarantees $E(C_S) \leq q$. 

Remark 2.1. For all error-rates known to us, C is a count or a ratio of counts, so the 
condition on the values C takes is satisfied. 

Proof of Theorem 2.1. The idea of the proof is similar to the proof of Theorem 1 in 
Benjamini and Yekutieli (2005). For each error criterion $E(C)$, let $C^+$ be the countable 
support of $C$. Since the selection rule is simple, we can define the following event on 
the space of all the p-values not belonging to family $i$: if family $i$ is selected, $k$ families 
are selected including family $i$. Denote this event by $C_k^{(i)}$. For any simple selection rule 

$$E(C_S) = \sum_{i=1}^m \sum_{k=1}^m \frac{1}{k} \sum_{c \in C^+} c \cdot \Pr\left(C_i = c,\ i \in S(P),\ C_k^{(i)}\right). \qquad (3)$$

Note that the Simple Selection-Adjusted Procedure does not reject any hypothesis in 
families which are not selected. Therefore $C_i = 0$ for each family $i$ that is not selected. 
Hence, for this procedure we obtain 

$$\begin{aligned}
E(C_S) &= \sum_{i=1}^m \sum_{k=1}^m \frac{1}{k} \sum_{c \in C^+} c \cdot \Pr\left(C_i = c,\ C_k^{(i)}\right) \\
&= \sum_{i=1}^m \sum_{k=1}^m \frac{1}{k} \sum_{c \in C^+} c \cdot \Pr(C_i = c) \Pr\left(C_k^{(i)}\right) \qquad (4) \\
&= \sum_{i=1}^m \sum_{k=1}^m \frac{1}{k} E(C_i) \Pr\left(C_k^{(i)}\right). \qquad (5)
\end{aligned}$$

Equality (4) follows from the independence between $P_i$ and the set of p-values not 
belonging to family $i$, for each $i = 1, \ldots, m$. In expression (5), for each $k$ and $i$, $C_i$ is 
the value of the random variable $C$ in family $i$ when a valid $E(C)$-controlling procedure 
is applied at level $kq/m$ in each selected family. Since there are no rejections in families 
that are not selected, $C_i$ takes the value 0 there, so $E(C_i) \leq kq/m$ for each $i = 1, \ldots, m$. 
Now, using this inequality and the fact that $\sum_{k=1}^m \Pr(C_k^{(i)}) = 1$ for each $i = 1, \ldots, m$, 
we obtain 

$$\sum_{i=1}^m \sum_{k=1}^m \frac{1}{k} E(C_i) \Pr\left(C_k^{(i)}\right) \leq \sum_{i=1}^m \sum_{k=1}^m \frac{1}{k} \cdot \frac{kq}{m} \Pr\left(C_k^{(i)}\right) = q. \qquad (6)$$

Results (5) and (6) complete the proof. □ 
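The theorem can be sanity-checked by simulation, which is of course no substitute for the proof. The sketch below works under the assumptions of Example 1.2 with $m = 100$ and $n = 2$ and estimates the average FWER over the selected families, with and without the $Rq/m$ adjustment of Procedure 2.1. The function name, the seed and the number of replications are arbitrary choices of ours.

```python
import numpy as np

def average_fwer_over_selected(m, n, q=0.05, select_below=0.05, adjust=True, reps=20000, seed=1):
    """Monte Carlo estimate of E(C_S) with C = I{V >= 1}, all nulls true.

    Selection: a family is selected if its minimum p-value is below `select_below`.
    Testing: Bonferroni inside each selected family at level R q / m (adjust=True) or q.
    """
    rng = np.random.default_rng(seed)
    p = rng.uniform(size=(reps, m, n))
    min_p = p.min(axis=2)                          # shape (reps, m)
    selected = min_p < select_below
    R = selected.sum(axis=1)                       # number of selected families per replication
    level = R * q / m if adjust else np.full(reps, q)
    # Bonferroni makes at least one rejection iff the minimum p-value is <= level / n
    any_error = selected & (min_p <= (level / n)[:, None])
    # C_S = (sum of C_i over selected families) / max(R, 1), i.e. 0 when nothing is selected
    c_s = any_error.sum(axis=1) / np.maximum(R, 1)
    return c_s.mean()

print("unadjusted:", average_fwer_over_selected(100, 2, adjust=False))
print("adjusted:  ", average_fwer_over_selected(100, 2, adjust=True))
```

With these settings the unadjusted estimate should be close to the 0.506 of Table 1, while the adjusted one should fall close to, and not materially above, $q = 0.05$.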

Theorem 2.1 supplies the adjustment of the testing level in each selected family 
which is sufficient for the control of $E(C_S)$ when the selection rule is simple. We will 
now show that in some special cases this adjustment is necessary, adapting Example 6 
in Benjamini and Yekutieli (2005) to our needs. 
Example 2.1. Assume all the families are of equal size, $n$. All the hypotheses are 
null, and all the p-values are jointly independent and uniformly distributed. Let us order 
the families by their minimal p-values. The simple selection rule is to choose the $k$ 
families with the smallest minimal p-values. Assume that each selected family is tested 
using the Bonferroni procedure at level $q'$. In this case the average error-rate over the 
selected is 

$$E(C_S) = \frac{E\left(\sum_{i \in S(P)} I_{\{V_i \geq 1\}}\right)}{k}.$$

The families where at least one type I error is made are the families with the smallest 
minimal p-values. Therefore, if $\sum_{i=1}^m I_{\{V_i \geq 1\}} \leq k$, we obtain 

$$\sum_{i \in S(P)} I_{\{V_i \geq 1\}} = \sum_{i=1}^m I_{\{V_i \geq 1\}}.$$

Note that $\sum_{i=1}^m V_i \sim \mathrm{Binom}(mn, q'/n)$, implying that $E(\sum_{i=1}^m V_i) = mq'$. Hence, using 
Markov's inequality we obtain 

$$\Pr\left(\sum_{i=1}^m I_{\{V_i \geq 1\}} > k\right) \leq \Pr\left(\sum_{i=1}^m V_i > k\right) \leq \frac{mq'}{k}. \qquad (7)$$

Note that $q' \leq q$, where $q$ is the desired level of $E(C_S)$ and is typically at most 0.05. 
Therefore, when $m/k$ is not much larger than 1 (say $k = m/2$ or $k = m - 3$ where $m$ is 
large), $mq'/k$ is very small and we can neglect it. Then we obtain 

$$E(C_S) \approx \frac{E\left(\sum_{i=1}^m I_{\{V_i \geq 1\}}\right)}{k} \approx \frac{E\left(\sum_{i=1}^m V_i\right)}{k} = \frac{mq'}{k}, \qquad (8)$$

and the adjustment $q' = kq/m$ is necessary for assuring that $E(C_S) \leq q$. 
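A small Monte Carlo sketch of Example 2.1, with hypothetical settings $m = 100$, $n = 10$, $k = 50$ and $q = 0.05$: it selects the $k$ families with the smallest minimal p-values and applies Bonferroni at level $q'$ within each. Under the global null, Bonferroni makes at least one rejection in a family exactly when its minimal p-value is at most $q'/n$, which is what the code uses; the function name and seed are our own.

```python
import numpy as np

def example_2_1(m=100, n=10, k=50, q_prime=0.05, reps=4000, seed=2):
    """Monte Carlo E(C_S) when the k families with the smallest minimal p-values are
    selected and each is tested with Bonferroni at level q_prime (all nulls true)."""
    rng = np.random.default_rng(seed)
    min_p = rng.uniform(size=(reps, m, n)).min(axis=2)   # minimal p-value of each family
    kth = np.sort(min_p, axis=1)[:, k - 1:k]             # k-th smallest minimal p-value
    selected = min_p <= kth                              # exactly k families (ties have prob. 0)
    any_error = min_p <= q_prime / n                     # V_i >= 1 iff min p <= q' / n
    return (selected & any_error).sum(axis=1).mean() / k

m, k, q = 100, 50, 0.05
print("q' = q      :", example_2_1(m=m, k=k, q_prime=q), "  m q / k =", m * q / k)
print("q' = k q / m:", example_2_1(m=m, k=k, q_prime=k * q / m), "  target q =", q)
```

With $q' = q$ the estimate is near $mq/k = 0.1$ (slightly below it, since $I_{\{V_i \geq 1\}} \leq V_i$), and with $q' = kq/m$ it is near $q$, in line with (8).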

The following procedure offers selection adjustment for any selection rule. This 
procedure reduces to Procedure 2.1 when the selection rule is simple. 
Procedure 2.2 (Selection-Adjusted Procedure). 

1. Apply the selection rule $S$ to the ensemble of sets $P$, identifying the selected set of 
families $S(P)$. 

2. For each selected family $i$, $i \in S(P)$, partition the ensemble of sets $P$ into $P_i$ (the set 
of the p-values belonging to family $i$) and $P^{(i)}$ (the ensemble of sets $P$ without the set 
$P_i$) and find 

$$R_{\min}(P^{(i)}) := \min_{p}\left\{|S(P^{(i)}, P_i = p)| : i \in S(P^{(i)}, P_i = p)\right\}, \qquad (9)$$

the minimal number of selected families when family $i$ is selected and the p-values for 
other families do not change. 

3. For each selected family $i$, apply an $E(C)$-controlling procedure at level 
$R_{\min}(P^{(i)})\, q/m$. 
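In general the minimization in (9) runs over all possible values of $P_i$. As a minimal sketch, we assume (hypothetically) that the selection rule acts on a single family-level p-value per family, such as the global-null p-values discussed in Section 3, and approximate $R_{\min}$ by scanning a grid of values for family $i$'s p-value; a grid can miss narrow intervals, so this is only an approximation. With the BH procedure as the selection rule, a step-up procedure and hence a simple rule by Section 3.1, the scan returns $R_{\min}(P^{(i)}) = |S(P)|$ for every selected family, so the levels coincide with those of Procedure 2.1. All function names are illustrative.

```python
import numpy as np

def bh_select(p, q=0.05):
    """BH step-up on family-level p-values; returns a boolean selection mask."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    sel = np.zeros(m, dtype=bool)
    if below.any():
        sel[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return sel

def r_min(family_p, i, rule, grid=None):
    """Grid approximation of R_min(P^(i)) in (9): the fewest selected families over
    values of family i's p-value for which family i is still selected."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 2001)
    p = np.array(family_p, dtype=float)    # copy; other families' values stay fixed
    best = None
    for v in grid:
        p[i] = v                           # change only family i
        sel = rule(p)
        if sel[i]:
            r = int(sel.sum())
            best = r if best is None else min(best, r)
    return best

def selection_adjusted_levels(family_p, rule, q):
    """Steps 1-3 of Procedure 2.2: testing level R_min(P^(i)) q / m for each selected i."""
    m = len(family_p)
    selected = np.nonzero(rule(np.asarray(family_p, dtype=float)))[0]
    return {int(i): r_min(family_p, i, rule) * q / m for i in selected}

family_p = [0.001, 0.008, 0.012, 0.3, 0.6, 0.9]
print(selection_adjusted_levels(family_p, bh_select, q=0.05))
```

For a non-simple rule, such as an adaptive procedure, the same scan can return a value of $R_{\min}$ smaller than $|S(P)|$, and the family is then tested at a correspondingly lower level.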

Theorem 2.2. If the p-values across the families are independent, then for any 
selection rule $S(P)$ and for any $E(C)$-controlling procedure valid for the dependency 
structure inside each family, the Selection-Adjusted Procedure guarantees $E(C_S) \leq q$. 

The proof of Theorem 2.2 is given in Appendix B. 

Remark 2.2. We could guarantee that at least one rejection is made in each selected family 
by repeatedly applying the Simple Selection-Adjusted Procedure, selecting each 
time the families where at least one rejection is made and adjusting the testing level at 
each iteration according to the proportion of selected families (out of all the families) 
at the previous iteration, until at least one rejection is made in each selected family. 

Interestingly, when each family consists of only one hypothesis and a selected 
hypothesis tested at level $\alpha$ is rejected if its p-value is at most $\alpha$, this iterative application of 
the Simple Selection-Adjusted Procedure is equivalent to the Benjamini and Hochberg 
procedure (BH hereafter, see Benjamini and Hochberg, 1995) applied on the whole set 
of p-values (provided that at the first iteration we select the hypotheses with p-values 
less than $q$). Obviously, the testing procedure applied in each selected family is an 
FWER-controlling procedure in this case, and the expected average value of $I_{\{V \geq 1\}}$ 
over the selected families is the expected proportion of type I errors out of all the 
selected hypotheses. Since it is guaranteed that each selected hypothesis is rejected, 
this is actually the FDR of the whole set of discoveries. 
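The equivalence stated in this remark is easy to check numerically. The sketch below assumes continuous p-values, so ties can be ignored, and uses "at most" comparisons throughout; the function names are hypothetical. The iteration stops as soon as every selected hypothesis is rejected, which is exactly the fixed point that BH reaches.

```python
import numpy as np

def bh(p, q):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

def iterated_simple_selection_adjusted(p, q):
    """Remark 2.2 with single-hypothesis families: first select p_i <= q, then repeatedly
    test each selected hypothesis at level R q / m (R = current number selected) and keep
    only the rejected ones, until every selected hypothesis is rejected."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    selected = p <= q
    while True:
        level = selected.sum() * q / m
        rejected = selected & (p <= level)
        if np.array_equal(rejected, selected):   # every selected hypothesis is rejected
            return rejected
        selected = rejected                      # fewer selected, so the loop terminates

rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(0, 0.01, size=20), rng.uniform(size=180)])
print(np.array_equal(iterated_simple_selection_adjusted(p, 0.05), bh(p, 0.05)))
```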

3 Selection of the families via multiple hypothesis testing 

If the selected families are considered as scientific findings by themselves, a situation 
often encountered in large testing problems, it would be appropriate to address the 
erroneous selection of a family, and control some error-rate of the selection process, as 
suggested, for example, by Heller et al. (2009) for selecting gene sets in microarray analysis 
and by Sun and Wei (2011) for analyzing time-course experiments. We may associate each 
family with its global null (intersection) hypothesis and use the inside-family p-values 
in order to construct a valid p-value for its intersection hypothesis (see Loughin (2004) 
for a systematic comparison of combining functions that can be used for this purpose). 
Then we may apply a multiple testing procedure on these combined p-values and select 
the families for which the global null hypothesis is rejected. The choice of the multiple 
testing procedure should be guided by the error rate that we wish to control at the 
family level and the dependency among the combined p-values. 
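A sketch of this two-step selection, assuming the Simes combination for each family's global null hypothesis (one of several possible combining functions, valid when the p-values within a family are independent) and the BH procedure across families. Both choices, and the helper names, are illustrative assumptions rather than a prescription.

```python
import numpy as np

def simes(p):
    """Simes combined p-value for a family's global null: min_j n p_(j) / j."""
    p = np.sort(np.asarray(p, dtype=float))
    n = len(p)
    return float(np.min(n * p / np.arange(1, n + 1)))

def bh(p, q):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

def select_families(families, q):
    """Select the families whose global null is rejected by BH at level q on Simes p-values."""
    combined = np.array([simes(p) for p in families])
    return np.nonzero(bh(combined, q))[0].tolist(), combined

families = [[0.01, 0.04, 0.5], [0.2, 0.5, 0.9], [0.001, 0.3], [0.02, 0.02, 0.8, 0.9]]
selected, combined = select_families(families, q=0.1)
print(selected, np.round(combined, 4))
```

Because BH is a step-up procedure, this selection rule is simple in the sense of Definition 2.1, so the selected families could then be tested with Procedure 2.1.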

Heller et al. (2009) address a similar problem of inference across families of 
hypotheses in microarray analysis. They first select promising gene sets and then look 
for differentially expressed genes within these gene sets. They define an erroneous 
discovery of a set as the selection of a set in which no gene is differentially expressed, or 
the appropriate selection of a set in which one of the genes is erroneously discovered. 
They define the Overall FDR criterion (OFDR) as the expected proportion of "erroneous" 
discoveries of gene sets out of all the selected gene sets. This error criterion is 
equivalent to $E(C_S)$ for $C = I_{\{V > 0\}}$ when it is guaranteed that at least one rejection is 
made in each family (gene set). This condition is not always fulfilled. For example, 
when the signal in the family is weak, it may be possible to see evidence that there 
is at least one signal in this family, but impossible to point out where this signal is. In 
these cases our criterion does not coincide with the OFDR. To see this, suppose 
an all-null family is selected, and there are no rejections inside this family. This family 
will make no contribution to $C_S$, whereas it will contribute to the proportion 
of "erroneous" discoveries of gene sets out of all the selected gene sets, as defined by 
Heller et al. (2009). 

In Heller et al. (2009) the division of the hypotheses into families is determined 
by the problem. In many applications each hypothesis carries two "tags", that is, 
the hypotheses have a two-way structure. The families can be constructed by pooling 
along either dimension. In these cases the researcher should define the families by the 
most important dimension for inference. In Section 4 we show an example of such an 
application. 

3.1 Simple and non-simple selection rules 

In Section 2 we defined what a simple selection rule is. It is obvious that any single-step 
multiple testing procedure satisfies this condition, since the cutoff for rejection does 
not depend on the other p-values. In addition, any step-up and step-down procedure 
defines a simple selection rule. See Appendix C for a proof. 

Important procedures which define selection rules that are not simple are the 
adaptive FDR procedures (Benjamini and Hochberg, 2000, Storey et al., 2004, Benjamini 
et al., 2006, Blanchard and Roquain, 2009). Let us consider the two-stage procedure 
given in Benjamini et al. (2006). At the first stage, the hypotheses are tested using the 
BH procedure at level $q' = q/(1+q)$. The estimator of the number of true null hypotheses 
is $\hat{m}_0 = m - R$, where $m$ is the total number of hypotheses and $R$ is the number of 
rejected hypotheses at the first stage. If $\hat{m}_0 = 0$, the procedure rejects all the 
hypotheses. Otherwise, the procedure rejects the hypotheses rejected by the BH procedure at 
level $(m/\hat{m}_0)\, q'$. The following example shows that this procedure is not simple. 

Example 3.1. Assume there are 3 hypotheses, $\{H_{0i}\}_{i=1}^3$. Let $P_1$, $P_2$, and $P_3$ be the 
corresponding p-values. If $P_1 < q'/3$, $q'/3 < P_2 < 2q'/3$ and $3q'/2 < P_3 < 3q'$, then $\hat{m}_0 = 1$ and 
all the hypotheses are rejected. Fix $P_1$ and $P_3$, and increase $P_2$ so that $2q'/3 < P_2 < q'$. 
Now $\hat{m}_0 = 2$, therefore $H_{02}$ is still rejected, but the total number of rejections changes 
from 3 to 2. 
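Example 3.1 can be checked numerically. The sketch implements the two-stage procedure exactly as described above, not any particular library's version, with $q = 0.05$ and p-values set to fixed multiples of $q'$ that lie inside the stated intervals; these particular values are our own choice.

```python
import numpy as np

def bh(p, level):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= level * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

def two_stage(p, q):
    """Two-stage procedure as described in the text: BH at q' = q / (1 + q),
    m0_hat = m - R; reject all if m0_hat = 0, else BH at level (m / m0_hat) q'."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    q1 = q / (1 + q)
    m0_hat = m - int(bh(p, q1).sum())
    if m0_hat == 0:
        return np.ones(m, dtype=bool)
    return bh(p, q1 * m / m0_hat)

q = 0.05
q1 = q / (1 + q)
# P1 < q'/3, q'/3 < P2 < 2q'/3, 3q'/2 < P3 < 3q'
p_a = np.array([0.2 * q1, 0.5 * q1, 2.0 * q1])
# Same P1 and P3, with P2 increased so that 2q'/3 < P2 < q'
p_b = p_a.copy()
p_b[1] = 0.8 * q1
print(two_stage(p_a, q), two_stage(p_b, q))
```

$H_{02}$ is rejected in both configurations, yet the number of rejections drops from 3 to 2 when only $P_2$ changes, which is what makes the rule non-simple.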

4 Associating SNPs with brain volume 

We would like to show the relevance of our approach to the voxelwise genome-wide 
association study performed by Stein et al. (2010). The authors explore the relation 
between each of 448293 Single Nucleotide Polymorphisms (SNPs) and each of 31622 
voxels of the entire brain across 740 elderly subjects, including subjects with Alzheimer's 
disease, Mild Cognitive Impairment, and healthy elderly controls from the Alzheimer's 
Disease Neuroimaging Initiative (ADNI). The phenotype of interest was the percentage 
volume difference relative to a sample-specific template at each voxel, and a regression 
was conducted at each SNP with the phenotype as the dependent variable and the 
number of minor alleles, age and sex as the independent variables (assuming the 
additive genetic model). In the original analysis, for each voxel only the most significantly 
associated SNP was considered. Its p-value was "corrected" in order to obtain a uniform 
distribution when no SNP is associated with that voxel. Then, the BH procedure was 
applied on the "corrected" p-values. Two were found at the 0.5 level, but the 5 top 
SNPs were selected for further research. This involved mapping the significance of the 
voxels for each one of these 5 SNPs. 
Note that the authors actually first divided the set of hypotheses into disjoint 
families, where each family was defined by a voxel, and the hypotheses within the 
family were the hypotheses on the association of each SNP with that voxel. The 
"corrected" p-value for each voxel was the p-value for testing the global null hypothesis 
for that voxel-family. Therefore, the authors selected the families, i.e. the voxels 
where evidence for at least one non-null association was obtained. Then, the authors 
considered the most associated SNPs within the selected voxels. So far, this analysis fits 
our framework. At the last step, though, they returned and defined the 5 SNP-families 
as their findings, looking at the significance of all voxels within each SNP separately. 

We would suggest another partition of the hypotheses into families. As can be 
understood from the paper, the authors are interested in finding SNPs associated with 
regions in the brain, and in being able to make maps of these regions. Therefore, it would be 
more appropriate to define each family as the set of all the association hypotheses for 
a specific SNP. This way, selection of the families would be equivalent to the selection 
of SNPs, which could be followed by finding the voxels associated with the selected 
SNPs. 

The next question is what error-rates should be controlled in this problem. The 
investigators do not wish to emphasize each voxel-SNP pair where an 
association is found; therefore there is no need to control some error-rate globally, 
on the combined set of all the discovered pairs. The emphasis is on the selected SNPs 
and on the regions in the brain that could be affected by these SNPs. Therefore, it 
would be reasonable to (1) control some error-rate when selecting the SNPs, (2) for 
each SNP, control some error-rate when selecting the voxels associated with that 
SNP, and (3) control the error-rate in (2) on the average over the selected SNPs. 
The control on the average over the selected guarantees the adjustment for selection 
bias. The most common types of control in MRI analysis are FWER and FDR. For the 
FWER, control on the average guarantees that the expected proportion of SNPs where 
at least one voxel is erroneously declared associated, out of all the selected SNPs, is 
bounded by a pre-specified number (say 0.05). FDR control on the average is a 
more liberal property: it guarantees that the expected average over the selected SNPs 
of the proportion of erroneously discovered voxels per SNP is bounded. 

The control of some error-rate when selecting the SNPs could be achieved by defining the global null p-values for each SNP and applying a multiple comparisons procedure to these p-values, where the choice of the procedure should be guided by the desired error-rate for the selection of SNPs (see Section 3). Theorems 2.1 and 2.2 offer the methods to obtain the control within SNPs and on the average over the selected SNPs. According to these theorems, any commonly used method in MRI research could be applied across voxels for each SNP separately at the adjusted level: re-sampling or Random Field Theory approaches for the control of FWER, or the BH procedure for the control of FDR. Theorems 2.1 and 2.2, however, assume independence of the p-values across SNPs. This question is addressed theoretically in the next section.
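The two-stage scheme can be sketched in Python under explicit assumptions: Simes global-null p-values for each SNP, the BH procedure across SNPs for selection, and the BH procedure across voxels within each selected SNP at level $Rq/m$, where $R$ is the number of selected SNPs and $m$ the number of SNPs. The level $Rq/m$ assumes the selection-adjusted level $R_{\min} q/m$, which equals $Rq/m$ for a simple selection rule such as BH (Appendix C). The function names and the toy p-values are illustrative only.

```python
def simes(pvals):
    """Simes global-null p-value of one family: min_j n * p_(j) / j."""
    n = len(pvals)
    return min(n * p / j for j, p in enumerate(sorted(pvals), start=1))


def bh(pvals, level):
    """Sorted indices rejected by the Benjamini-Hochberg step-up procedure at `level`."""
    n = len(pvals)
    order = sorted(range(n), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * level / n:
            k = rank  # largest rank whose p-value is below its critical value
    return sorted(order[:k])


def selection_adjusted_bh(families, q):
    """Select families by BH on Simes p-values, then test each selected family by BH at R*q/m.

    families: list of lists of p-values, one list per family (e.g. per SNP).
    Returns {family index: sorted indices of rejected hypotheses}.
    """
    m = len(families)
    selected = bh([simes(f) for f in families], q)
    level = len(selected) * q / m  # selection-adjusted level for a simple selection rule
    return {i: bh(families[i], level) for i in selected}


# Hypothetical p-values for 4 SNPs with 3 voxels each.
families = [
    [0.001, 0.2, 0.5],
    [0.3, 0.6, 0.9],
    [0.0004, 0.002, 0.7],
    [0.4, 0.8, 0.95],
]
print(selection_adjusted_bh(families, q=0.05))  # {0: [0], 2: [0, 1]}
```

Here SNPs 0 and 2 are selected ($R = 2$), so their voxels are tested at level $2 \cdot 0.05 / 4 = 0.025$ rather than at $0.05$.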

5 Average control under dependency across the families

All the results given so far hold when the p-values across the families are independent. We will now consider the case where the set of all the p-values possesses the property of positive regression dependency on a subset (PRDS).

First recall that a set $D$ in $\mathbb{R}^n$ is increasing (decreasing) if $x \in D$ and $y \ge x$ ($y \le x$) imply that $y \in D$.

Definition 5.1 (Benjamini and Yekutieli, 2001). The vector $X$ is PRDS on $I_0$ if for any increasing set $D$ (where $x \in D$ and $y \ge x$ imply that $y \in D$) and for each $i \in I_0$, $P(X \in D \mid X_i = x)$ is nondecreasing in $x$.

In addition, we require that the selection rule be concordant, as defined in Benjamini and Yekutieli (2005).

Definition 5.2 (Benjamini and Yekutieli, 2005). A selection rule is concordant if for each $i = 1, \dots, m$ and $k = 1, \dots, m$, the set $\{P^{(i)} : k \le R_{\min}(P^{(i)})\}$ is a decreasing set.

It is easy to see that many selection rules are concordant. Both the rule that selects each family whose minimal p-value is less than $q$ and the rule that selects the $k$ families with the smallest minimal p-values are concordant. When the selection is made via hypothesis testing, any step-up or step-down procedure is concordant.
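The two rules named above can be checked numerically. The Python sketch below summarizes each family by a single p-value (for instance its minimal p-value), approximates $R_{\min}(P^{(i)})$ by a grid search over the p-value of family $i$, and verifies on random draws that making the other families' p-values larger never increases $R_{\min}$, which is what it means for $\{P^{(i)} : k \le R_{\min}(P^{(i)})\}$ to be a decreasing set. The grid approximation and all function names are assumptions of this illustration, not part of the paper.

```python
import random


def select_below(p, q):
    """Select every family whose (minimal) p-value is below q."""
    return {i for i, pi in enumerate(p) if pi < q}


def select_top_k(p, k):
    """Select the k families with the smallest (minimal) p-values."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    return set(order[:k])


def r_min(rule, p, i, grid):
    """Grid approximation of R_min(P^(i)): the smallest number of selected
    families over values x of family i's p-value for which i is selected."""
    sizes = []
    for x in grid:
        s = rule(p[:i] + [x] + p[i + 1:])
        if i in s:
            sizes.append(len(s))
    return min(sizes) if sizes else None


def looks_concordant(rule, m=6, trials=200, seed=1):
    """On random draws, check that raising the other p-values never raises R_min."""
    rng = random.Random(seed)
    grid = [g / 200 for g in range(201)]
    for _ in range(trials):
        p = [rng.random() for _ in range(m)]
        i = rng.randrange(m)
        # every p-value except family i's is moved up towards 1
        larger = [pj if j == i else pj + rng.random() * (1 - pj) for j, pj in enumerate(p)]
        before = r_min(rule, p, i, grid)
        after = r_min(rule, larger, i, grid)
        if before is not None and after is not None and after > before:
            return False
    return True


print(looks_concordant(lambda p: select_below(p, 0.05)))  # expected: True
print(looks_concordant(lambda p: select_top_k(p, 3)))     # expected: True
```

For the threshold rule, $R_{\min}(P^{(i)})$ is one plus the number of other families below $q$; for the top-$k$ rule it is always $k$.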

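Theorem 5.1. If the set of all the p-values is PRDS on the subset of p-values corresponding to true null hypotheses, the selection rule is concordant, and the procedure used for testing each selected family is (1) the Bonferroni procedure or (2) the BH procedure, then the Selection-Adjusted Procedure guarantees in case (1):

$$E\left(\frac{\sum_{i \in \mathcal{S}(P)} V_i}{\max(R, 1)}\right) \le q,$$

and in case (2):

$$E\left(\frac{\sum_{i \in \mathcal{S}(P)} Q_i}{\max(R, 1)}\right) \le q,$$

where $R = |\mathcal{S}(P)|$, and $V_i$ and $Q_i$ are the number and the proportion of false discoveries in family $i$. Since $I(V_i > 0) \le V_i$, case (1) implies control of the FWER on the average over the selected families.
The proof is given in Appendix D.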
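A Monte Carlo sketch of case (2) in Python, under stated assumptions: test statistics are equicorrelated Gaussian and tests are one-sided (multivariate normal statistics with nonnegative correlations give PRDS p-values, Benjamini and Yekutieli, 2001); families are selected by the BH procedure applied to Simes global-null p-values (a step-up, hence concordant, rule); and each selected family is tested by BH at level $Rq/m$. The configuration (20 families of 5 hypotheses, correlation 0.5, a shift of 3 for two hypotheses in half of the families) is arbitrary, and the function names are hypothetical.

```python
import math
import random


def norm_sf(z):
    """Upper-tail probability P(Z > z) of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def simes(pvals):
    n = len(pvals)
    return min(n * p / j for j, p in enumerate(sorted(pvals), start=1))


def bh(pvals, level):
    n = len(pvals)
    order = sorted(range(n), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * level / n:
            k = rank
    return order[:k]


def simulate(m=20, n=5, q=0.1, rho=0.5, mu=3.0, reps=2000, seed=7):
    """Estimate E( sum_{i selected} Q_i / max(R, 1) ) for the selection-adjusted BH procedure."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # common factor: pairwise correlation rho across all tests
        fams, is_null = [], []
        for i in range(m):
            ps, nulls = [], []
            for j in range(n):
                shift = mu if (i >= m // 2 and j < 2) else 0.0
                z = shift + math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                ps.append(norm_sf(z))  # one-sided p-value
                nulls.append(shift == 0.0)
            fams.append(ps)
            is_null.append(nulls)
        selected = bh([simes(f) for f in fams], q)
        r = len(selected)
        level = r * q / m
        acc = 0.0
        for i in selected:
            rejected = bh(fams[i], level)
            n_false = sum(1 for j in rejected if is_null[i][j])
            acc += n_false / max(len(rejected), 1)  # FDP in family i
        total += acc / max(r, 1)
    return total / reps


print(simulate())  # an estimate that should not exceed q = 0.1 beyond simulation error
```

In this configuration the proof in Appendix D actually bounds the criterion by $q \sum_i m_{0i}/(m\, m_i) = 0.8q$.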

6 Discussion 

Outside of Heller et al. (2009), discussed in Section 3, there have been very few works that formally address the issue of inference across families. We have mentioned Efron (2008) in the Introduction. Other works dealing with this issue are Hu et al. (2010) and Sun and Wei (2011). Neither of these two papers addresses the testing of multiple families of hypotheses within the framework of selective inference, which is the concern of our work. Testing each family separately while controlling some error-rate within each tested family has the obvious advantage that the control is also achieved on the average across families. However, once only some families are selected based on the same data, and inference is made or reported only on the selected ones, even this simple average error-rate across families deteriorates. In this note we pointed out this danger, formulated it, and offered simple, even if not optimal, ways to address it. Sometimes the situation calls for more stringent control. This is the case when interest lies in assuring simultaneous control of the error-rate across families, and not merely on the average over the selected families.

Such a concern for simultaneity of inference across selected families can be formulated by $E(\max_{i=1,\dots,m} C_i)$. For example, in the case $C_i = I(Q_i > \gamma)$ this is the probability that in at least one family the false discovery proportion is greater than $\gamma$. It is easy to see that controlling $E(C_i)$ at level $q/m$ in each family guarantees the control of this error criterion. However, in applications the interest usually does not lie in all the families, but only in the promising ones. Therefore we address only the selective goal in this
case.
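The claim follows in one line, assuming each per-family criterion satisfies $C_i \ge 0$ and is controlled at level $q/m$ (a Bonferroni-type split of $q$ across the $m$ families); the maximum of nonnegative terms is bounded by their sum:

```latex
E\Big(\max_{i=1,\dots,m} C_i\Big)
  \le E\Big(\sum_{i=1}^{m} C_i\Big)
  = \sum_{i=1}^{m} E(C_i)
  \le m \cdot \frac{q}{m}
  = q .
```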

It may sometimes happen that there are no rejections in a selected family. For example, when the signal in the family is weak, it may be possible to see evidence that there is at least one signal in this family, but impossible to point out where the signal is. Some investigators may claim that in this case the interpretation of the results is not intuitive, and therefore they wish to have at least one rejection in each selected family. This can easily be done by choosing appropriate selection and testing procedures. It is easy to see that if the testing procedure used for selecting the families is a stepwise procedure with critical values $\alpha_k \le kq/m$, $k = 1, \dots, m$, and the global null p-value for each family is not less than its minimal adjusted p-value (where the adjustment is made according to the procedure used for testing the selected families), then at least one rejection is guaranteed in each selected family.
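A quick numerical check of this guarantee for one concrete choice, sketched in Python: families are selected by the BH step-up procedure (critical values $kq/m$) applied to Simes global-null p-values, and each selected family is tested by BH at level $Rq/m$. For BH, the Simes p-value of a family equals its minimal BH-adjusted p-value, so the condition above holds with equality. The random instances are synthetic and the function names are illustrative.

```python
import random


def simes(pvals):
    n = len(pvals)
    return min(n * p / j for j, p in enumerate(sorted(pvals), start=1))


def bh(pvals, level):
    n = len(pvals)
    order = sorted(range(n), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * level / n:
            k = rank
    return order[:k]


def every_selected_family_has_a_rejection(families, q):
    m = len(families)
    selected = bh([simes(f) for f in families], q)
    level = len(selected) * q / m  # selection-adjusted level R*q/m
    return all(len(bh(families[i], level)) > 0 for i in selected)


rng = random.Random(3)
ok = True
for _ in range(500):
    m = rng.randint(1, 15)
    # cubing a uniform makes small p-values common, so that some families get selected
    families = [[rng.random() ** 3 for _ in range(rng.randint(1, 8))] for _ in range(m)]
    ok = ok and every_selected_family_has_a_rejection(families, q=0.1)
print(ok)  # expected: True
```

A selected family has Simes p-value at most $Rq/m$, so some ordered p-value satisfies $p_{(j)} \le j (Rq/m)/n_i$, which is exactly a BH rejection at level $Rq/m$.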

In this paper we mainly addressed the goal of controlling some error-rate within each family and on the average over the selected families. Other types of error measures may be relevant as well. The investigator might wish to control some error-rate on the pooled set of discoveries across all the families. This seems to be the only concern in Efron (2008) and Hu et al. (2010). If the selected families are considered as scientific findings by themselves, a situation often encountered in large testing problems, it would be appropriate to address the erroneous selection of a family, and control some error-rate of the selection process, as suggested, for example, by Heller et al. (2009) in the context of microarray analysis and by Sun and Wei (2011) in the context of time-course experiments. The investigator may be interested in more than one type of error measure. For example, one might wish to control the FDR within each gene set, on the average over the selected gene sets, and globally on the pooled set of discovered genes across all the gene sets. Therefore, an interesting research direction could be the development of procedures that concurrently control several error measures of interest to the investigator.



7 Appendix 
7.1 Appendix A 

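In Example 1, the formula for $E(C_S)$ is the following:

$$E(C_S) = \frac{\left(1 - \left(1 - \frac{q}{n}\right)^n\right)\left(1 - (1 - q)^{nm}\right)}{1 - (1 - q)^n}. \qquad (10)$$

The proof is as follows. In this case

$$E(C_S) = \sum_{i=1}^{m}\sum_{k=1}^{m} \frac{1}{k}\Pr\left(V_i > 0,\ i \in \mathcal{S}(P),\ |\mathcal{S}(P)| = k\right).$$

Family $i$ is selected if its minimal p-value is less than $q$. Each selected family is tested using the Bonferroni procedure at level $q$. Since all the null hypotheses are true, there is at least one type I error in family $i$ if its minimal p-value is less than $q/n$. Therefore, each family where at least one type I error is made is selected. Now we obtain

$$E(C_S) = \sum_{i=1}^{m}\sum_{k=1}^{m} \frac{1}{k}\Pr(V_i > 0)\Pr\left(\text{exactly } k-1 \text{ of the other } m-1 \text{ families are selected}\right)$$
$$= \sum_{i=1}^{m}\sum_{k=1}^{m} \frac{1}{k}\left(1 - \left(1 - \frac{q}{n}\right)^n\right)\binom{m-1}{k-1}\left(1 - (1 - q)^n\right)^{k-1}(1 - q)^{n(m-k)}.$$

Let us define the random variable $Y \sim \mathrm{Bin}\left(m - 1,\ 1 - (1 - q)^n\right)$. It is easy to see that

$$E(C_S) = m\left(1 - \left(1 - \frac{q}{n}\right)^n\right)E\left(\frac{1}{Y + 1}\right). \qquad (11)$$

Using Lemma 1 in Benjamini et al. (2006) we obtain:

$$E\left(\frac{1}{Y + 1}\right) = \frac{1 - (1 - q)^{nm}}{m\left(1 - (1 - q)^n\right)}. \qquad (12)$$

Substituting (12) in (11) we obtain the formula in (10).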
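The derivation can be checked numerically. The Python sketch below assumes the Example 1 setting as read from this appendix ($m$ families of $n$ hypotheses, all nulls true, independent uniform p-values, selection when the minimal p-value is below $q$, Bonferroni at level $q$ within selected families). It compares the closed form (10), the binomial sum behind (11), and a direct simulation; the function names are hypothetical.

```python
import random
from math import comb


def e_cs_closed_form(m, n, q):
    """Closed form (10)."""
    a = 1 - (1 - q / n) ** n  # P(at least one Bonferroni rejection in a family)
    b = 1 - (1 - q) ** n      # P(a family is selected)
    return a * (1 - (1 - q) ** (n * m)) / b


def e_cs_binomial_sum(m, n, q):
    """Formula (11), with E(1/(Y+1)) summed explicitly over Y ~ Bin(m-1, b)."""
    a = 1 - (1 - q / n) ** n
    b = 1 - (1 - q) ** n
    e_inv = sum(comb(m - 1, y) * b ** y * (1 - b) ** (m - 1 - y) / (y + 1) for y in range(m))
    return m * a * e_inv


def e_cs_monte_carlo(m, n, q, reps=20000, seed=11):
    """Direct simulation of Example 1 with all null hypotheses true."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        mins = [min(rng.random() for _ in range(n)) for _ in range(m)]
        selected = [i for i in range(m) if mins[i] < q]
        # Bonferroni at level q rejects in family i iff its minimal p-value is below q/n
        with_error = sum(1 for i in selected if mins[i] < q / n)
        total += with_error / max(len(selected), 1)
    return total / reps


for m, n, q in [(10, 5, 0.05), (100, 10, 0.05)]:
    print(m, n, q, e_cs_closed_form(m, n, q), e_cs_binomial_sum(m, n, q))
```

For $m = 10$, $n = 5$ and $q = 0.05$ the closed form gives about 0.2, four times the nominal level, which is the inflation Example 1 illustrates.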

7.2 Appendix B 

Proof of Theorem 2.2

Proof. For each error criterion $E(C)$, let $\mathcal{C}^+$ be the support of the random variable $C$. As in Benjamini and Yekutieli (2005), we define the following series of events:

$$C_k^{(i)} := \{P^{(i)} : R_{\min}(P^{(i)}) = k\}. \qquad (13)$$

According to the definition of $R_{\min}(P^{(i)})$ in Section 2, for each value of $P^{(i)}$ and $P_i = p$ such that $i \in \mathcal{S}(P^{(i)}, p)$, $R_{\min}(P^{(i)}) \le |\mathcal{S}(P^{(i)}, p)|$. Therefore,

$$E(C_S) = E\left(\frac{\sum_{i \in \mathcal{S}(P)} C_i}{\max(|\mathcal{S}(P)|, 1)}\right) \le \sum_{i=1}^{m}\sum_{k=1}^{m}\sum_{c \in \mathcal{C}^+} \frac{c}{k}\Pr\left(i \in \mathcal{S}(P),\ C_i = c,\ C_k^{(i)}\right). \qquad (14)$$

The expression in (14) is identical to the expression in (3) in the proof of Theorem 2.1, but the definition of $C_k^{(i)}$ here is different. In the proof of Theorem 2.1 we use only the facts that the event $C_k^{(i)}$ is defined on the space of $P^{(i)}$ and that for each $i = 1, \dots, m$, $\sum_{k=1}^{m}\Pr(C_k^{(i)}) = 1$. These facts remain true for the series of events defined in (13). Therefore, the arguments used in the proof of Theorem 2.1 after obtaining (3) can be applied here. □

7.3 Appendix C 

We will now prove that any step-up or step-down procedure defines a simple selection rule. Let $\alpha_1 \le \alpha_2 \le \dots \le \alpha_m$ be the critical values of the given procedure. Let $H_{0i}$ be a certain rejected hypothesis, and $P_i$ be its p-value. We need to show that when all the p-values excluding $P_i$ are fixed and $P_i$ changes, as long as $H_{0i}$ is rejected, the total number of rejections remains unchanged.

Assume this is a step-up procedure. Let $p^{(i)}_{(1)} \le \dots \le p^{(i)}_{(m-1)}$ be the ordered set of p-values excluding $P_i$. If $p^{(i)}_{(k-1)} \le \alpha_k$ and $p^{(i)}_{(k)} > \alpha_{k+1}, \dots, p^{(i)}_{(m-1)} > \alpha_m$, the number of rejections is $k$ for any value of $P_i$ which guarantees that $H_{0i}$ is rejected, i.e. $P_i \le \alpha_k$.

Now assume that this is a step-down procedure. Let $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ be the ordered set of p-values. Assume the number of rejections is $k$, thereby implying $p_{(1)} \le \alpha_1, \dots, p_{(k)} \le \alpha_k$, $p_{(k+1)} > \alpha_{k+1}$. Since $H_{0i}$ is rejected, $P_i = p_{(j)}$ for some $j \le k$. Let us fix all the p-values excluding $P_i$ and change the value of $P_i$ so that $H_{0i}$ is still rejected. Assume $p'_{(1)} \le p'_{(2)} \le \dots \le p'_{(m)}$ is the ordered sequence of p-values after the value of $P_i$ is changed. Now $P_i = p'_{(j')}$, and since $H_{0i}$ is rejected, $p'_{(s)} \le \alpha_s$ for each $s \le j'$. If $j' = j$, it is obvious that the number of rejections remains unchanged. We will now deal separately with two cases: $j' < j$ and $j' > j$.

(1) Assume $j' < j$. Then $j' < k$, therefore it remains to show that $p'_{(s)} \le \alpha_s$ for $s = j' + 1, \dots, k$ and $p'_{(k+1)} > \alpha_{k+1}$. Note that $p'_{(s)} = p_{(s-1)} \le \alpha_{s-1} \le \alpha_s$ for $s = j' + 1, \dots, j$. For $s > j$, $p'_{(s)} = p_{(s)}$; therefore it is now obvious that the number of rejections remains unchanged.

(2) Assume $j' > j$. We will now show that $j' \le k$. Assume $j \le k < j'$. Note that for each $j \le s < j'$, $p'_{(s)} = p_{(s+1)}$. In particular, $p'_{(k)} = p_{(k+1)} > \alpha_{k+1} \ge \alpha_k$, contradicting the rejection of $H_{0i}$. Having proved that $j' \le k$, the result follows immediately, since $p'_{(s)} = p_{(s)}$ for $s > j'$.
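The property just proved can be checked numerically. The Python sketch below implements a generic step-up and step-down procedure, instantiates them with the BH critical values $kq/m$ (step-up) and the Holm critical values $q/(m-k+1)$ (step-down) as examples, and moves the p-value of each rejected hypothesis over a grid: whenever that hypothesis stays rejected, the total number of rejections must not change. The function names and the grid are assumptions of this illustration.

```python
import random


def step_up(p, alpha):
    """Rejected indices of a step-up procedure with critical values alpha[0] <= ... <= alpha[m-1]."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    k = 0
    for rank in range(len(p), 0, -1):  # scan from the largest p-value down
        if p[order[rank - 1]] <= alpha[rank - 1]:
            k = rank
            break
    return set(order[:k])


def step_down(p, alpha):
    """Rejected indices of a step-down procedure with the same kind of critical values."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    k = 0
    while k < len(p) and p[order[k]] <= alpha[k]:  # scan from the smallest p-value up
        k += 1
    return set(order[:k])


def is_simple(procedure, p, alpha, grid):
    """For each rejected i, move p[i] over the grid; whenever i stays rejected,
    the total number of rejections must stay the same."""
    for i in procedure(p, alpha):
        counts = set()
        for x in grid:
            rejected = procedure(p[:i] + [x] + p[i + 1:], alpha)
            if i in rejected:
                counts.add(len(rejected))
        if len(counts) > 1:
            return False
    return True


m, q = 8, 0.1
bh_alpha = [k * q / m for k in range(1, m + 1)]          # BH (step-up)
holm_alpha = [q / (m - k + 1) for k in range(1, m + 1)]  # Holm (step-down)
grid = [g / 200 for g in range(201)]
rng = random.Random(5)
ok = True
for _ in range(50):
    p = [rng.random() ** 4 for _ in range(m)]  # many small p-values, hence several rejections
    ok = ok and is_simple(step_up, p, bh_alpha, grid) and is_simple(step_down, p, holm_alpha, grid)
print(ok)  # expected: True
```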

7.4 Appendix D 

Proof of Theorem 5.1

The proof uses the techniques developed in Benjamini and Yekutieli (2001, 2005). The proof in case (2) is much more involved than in case (1).

7.4.1 Proof for case (1) 

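For each $i = 1, \dots, m$, let $m_i$ be the number of hypotheses in family $i$ and $m_{0i}$ be the number of true null hypotheses in family $i$. Let $H_{0ij}$ and $P_{ij}$, $j = 1, \dots, m_i$, be the hypotheses and the p-values in family $i$, $i = 1, \dots, m$, where, without loss of generality, $H_{0i1}, \dots, H_{0im_{0i}}$ are the true null hypotheses. We will use the series of events

$$C_k^{(i)} := \{P^{(i)} : R_{\min}(P^{(i)}) = k\}.$$

We will prove the following:

$$E\left(\frac{\sum_{i \in \mathcal{S}(P)} V_i}{\max(|\mathcal{S}(P)|, 1)}\right) \le \sum_{i=1}^{m}\sum_{k=1}^{m}\sum_{j=1}^{m_{0i}} \frac{1}{k}\Pr\left(i \in \mathcal{S}(P),\ C_k^{(i)},\ P_{ij} \le \frac{kq}{m\, m_i}\right)$$
$$\le \sum_{i=1}^{m}\sum_{k=1}^{m}\sum_{j=1}^{m_{0i}} \frac{1}{k}\Pr\left(C_k^{(i)},\ P_{ij} \le \frac{kq}{m\, m_i}\right) \qquad (15)$$
$$\le \sum_{i=1}^{m}\sum_{j=1}^{m_{0i}} \frac{q}{m\, m_i}\sum_{k=1}^{m}\Pr\left(C_k^{(i)} \,\middle|\, P_{ij} \le \frac{kq}{m\, m_i}\right). \qquad (16)$$

The first inequality holds because on $C_k^{(i)}$ family $i$ is tested by the Bonferroni procedure at level $kq/m$, and $|\mathcal{S}(P)| \ge R_{\min}(P^{(i)}) = k$ whenever $i \in \mathcal{S}(P)$. Inequality (15) is obtained by dropping the condition $i \in \mathcal{S}(P)$. Inequality (16) holds since the p-values corresponding to true null hypotheses have a uniform (or stochastically larger) distribution. We will now prove that for any $i = 1, \dots, m$ and $j = 1, \dots, m_{0i}$:

$$\sum_{k=1}^{m}\Pr\left(C_k^{(i)} \,\middle|\, P_{ij} \le \frac{kq}{m\, m_i}\right) \le 1. \qquad (17)$$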

Since the selection rule is concordant, the set $D_k^{(i)} = \bigcup_{l=1}^{k} C_l^{(i)}$, which can be written as $\{P^{(i)} : R_{\min}(P^{(i)}) < k + 1\}$, is an increasing set. The PRDS property on the subset of the p-values corresponding to the true null hypotheses implies that for any $i = 1, \dots, m$, $j = 1, \dots, m_{0i}$:

$$\Pr\left(D_k^{(i)} \mid P_{ij} \le a\right) \le \Pr\left(D_k^{(i)} \mid P_{ij} \le a'\right) \qquad (18)$$

for any $a \le a'$. Now, we obtain for any $k = 1, \dots, m - 1$:

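$$\Pr\left(D_k^{(i)} \,\middle|\, P_{ij} \le \frac{kq}{m\, m_i}\right) + \Pr\left(C_{k+1}^{(i)} \,\middle|\, P_{ij} \le \frac{(k+1)q}{m\, m_i}\right)$$
$$\le \Pr\left(D_k^{(i)} \,\middle|\, P_{ij} \le \frac{(k+1)q}{m\, m_i}\right) + \Pr\left(C_{k+1}^{(i)} \,\middle|\, P_{ij} \le \frac{(k+1)q}{m\, m_i}\right)$$
$$= \Pr\left(D_{k+1}^{(i)} \,\middle|\, P_{ij} \le \frac{(k+1)q}{m\, m_i}\right). \qquad (19)$$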

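Applying inequality (19) repeatedly for $k = 1, \dots, m - 1$, and using the fact that $C_1^{(i)} = D_1^{(i)}$, we obtain

$$\sum_{k=1}^{m}\Pr\left(C_k^{(i)} \,\middle|\, P_{ij} \le \frac{kq}{m\, m_i}\right) \le \Pr\left(D_m^{(i)} \,\middle|\, P_{ij} \le \frac{q}{m_i}\right) \le 1. \qquad (20)$$

Using (16) and (20) we obtain

$$E\left(\frac{\sum_{i \in \mathcal{S}(P)} V_i}{\max(|\mathcal{S}(P)|, 1)}\right) \le \sum_{i=1}^{m} \frac{m_{0i}\, q}{m\, m_i} \le q.$$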

7.4.2 Proof for case (2) 

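Let $P_i$ be the set of p-values corresponding to family $i$. For each $j = 1, \dots, m_{0i}$, let $P_i^{(j)}$ denote the set of the remaining $m_i - 1$ p-values after dropping $P_{ij}$. Let us define the following series of events on the range of $P_i^{(j)}$: for family $i$, let $B_r^{(ij)}[k]$, $r = 1, \dots, m_i$, denote the event in which, if $H_{0ij}$ is rejected by the BH procedure at level $kq/m$, $r$ hypotheses (including $H_{0ij}$) are rejected alongside it. We will use again the series of events $C_k^{(i)}$ defined in (13) in Appendix B. On $C_k^{(i)} \cap B_r^{(ij)}[k]$, the hypothesis $H_{0ij}$ is rejected if and only if $P_{ij} \le rkq/(m\, m_i)$. Using this, and the fact that the p-values corresponding to the true null hypotheses have a uniform (or stochastically larger) distribution, we obtain

$$E\left(\frac{\sum_{i \in \mathcal{S}(P)} Q_i}{\max(|\mathcal{S}(P)|, 1)}\right) \le \sum_{i=1}^{m}\sum_{k=1}^{m}\sum_{r=1}^{m_i}\sum_{j=1}^{m_{0i}} \frac{1}{k}\cdot\frac{1}{r}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k],\ P_{ij} \le \frac{rkq}{m\, m_i}\right)$$
$$\le \sum_{i=1}^{m}\sum_{k=1}^{m}\sum_{r=1}^{m_i}\sum_{j=1}^{m_{0i}} \frac{1}{k}\cdot\frac{1}{r}\cdot\frac{rkq}{m\, m_i}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{rkq}{m\, m_i}\right)$$
$$= \sum_{i=1}^{m}\sum_{j=1}^{m_{0i}} \frac{q}{m\, m_i}\sum_{k=1}^{m}\sum_{r=1}^{m_i}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{rkq}{m\, m_i}\right).$$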

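Now the fact that $m_{0i} \le m_i$ reduces case (2) to the inequality

$$\sum_{k=1}^{m}\sum_{r=1}^{m_i}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{rkq}{m\, m_i}\right) \le 1, \qquad (21)$$

where $i = 1, \dots, m$, $j = 1, \dots, m_{0i}$.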

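For each $i = 1, \dots, m$ and $t = 1, \dots, m\, m_i$, let us define the group

$$I_t = \{(a, b) : a \in \{1, \dots, m\},\ b \in \{1, \dots, m_i\},\ ab = t\}.$$

Obviously, $I_t$ is a finite set. Note that

$$\sum_{k=1}^{m}\sum_{r=1}^{m_i}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{rkq}{m\, m_i}\right) = \sum_{t=1}^{m\, m_i}\sum_{(k, r) \in I_t}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{tq}{m\, m_i}\right).$$

Therefore, (21) can be written in the form

$$\sum_{t=1}^{m\, m_i}\sum_{(k, r) \in I_t}\Pr\left(C_k^{(i)},\ B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{tq}{m\, m_i}\right) \le 1. \qquad (22)$$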
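Regrouping the double sum over $(k, r)$ by the value of the product $t = kr$ is purely combinatorial. A small Python sketch, with hypothetical helper names, builds the groups $I_t$ and checks the identity for an arbitrary summand $f(k, r, t)$ that stands in for $\Pr(C_k^{(i)}, B_r^{(ij)}[k] \mid P_{ij} \le tq/(m\, m_i))$.

```python
from collections import defaultdict


def index_groups(m, m_i):
    """I_t = {(k, r): 1 <= k <= m, 1 <= r <= m_i, k * r = t} for t = 1, ..., m * m_i."""
    groups = defaultdict(list)
    for k in range(1, m + 1):
        for r in range(1, m_i + 1):
            groups[k * r].append((k, r))
    # some t have no factorization within the ranges; their group is empty
    return {t: groups.get(t, []) for t in range(1, m * m_i + 1)}


def regrouping_holds(m, m_i, f):
    """Check sum_k sum_r f(k, r, k*r) == sum_t sum_{(k, r) in I_t} f(k, r, t)."""
    lhs = sum(f(k, r, k * r) for k in range(1, m + 1) for r in range(1, m_i + 1))
    rhs = sum(f(k, r, t) for t, pairs in index_groups(m, m_i).items() for k, r in pairs)
    return abs(lhs - rhs) < 1e-12


groups = index_groups(3, 2)
print(groups[2], groups[5], groups[6])  # [(1, 2), (2, 1)] [] [(3, 2)]
```

Each pair $(k, r)$ lands in exactly one group, so the groups partition the index set, which is all the rewriting of (21) as (22) uses.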
For each family $i$ and its hypothesis $H_{0ij}$, $i = 1, \dots, m$, $j = 1, \dots, m_{0i}$, let us define

$$A_s^{(ij)} = \bigcup_{t=1}^{s}\ \bigcup_{(k, r) \in I_t}\left(C_k^{(i)} \cap B_r^{(ij)}[k]\right), \qquad s = 1, \dots, m\, m_i.$$

The key statement for proving (22) is the following proposition.

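Proposition 7.1. For any $s = 1, \dots, m\, m_i - 1$, $i = 1, \dots, m$, $j = 1, \dots, m_{0i}$, one has the inequality

$$\Pr\left(A_s^{(ij)} \,\middle|\, P_{ij} \le \frac{sq}{m\, m_i}\right) + \sum_{(k, r) \in I_{s+1}}\Pr\left(C_k^{(i)} \cap B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{(s+1)q}{m\, m_i}\right) \le \Pr\left(A_{s+1}^{(ij)} \,\middle|\, P_{ij} \le \frac{(s+1)q}{m\, m_i}\right). \qquad (23)$$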



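Let us show that this proposition implies (22). Note that

$$\Pr\left(A_1^{(ij)} \,\middle|\, P_{ij} \le \frac{q}{m\, m_i}\right) = \sum_{(k, r) \in I_1}\Pr\left(C_k^{(i)} \cap B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{q}{m\, m_i}\right).$$

Now, a successive application of (23) with $s = 1, \dots, m\, m_i - 1$ leads to the inequality

$$\sum_{t=1}^{m\, m_i}\sum_{(k, r) \in I_t}\Pr\left(C_k^{(i)} \cap B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{tq}{m\, m_i}\right) \le \Pr\left(A_{m\, m_i}^{(ij)} \,\middle|\, P_{ij} \le q\right) \le 1,$$

which implies (22).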



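Proof of Proposition 7.1. Let us show that $A_s^{(ij)}$ is an increasing set. Let

$$D_k^{(i)} := \{P^{(i)} : R_{\min}(P^{(i)}) < k + 1\}$$

and

$$G_r^{(ij)}[k] := \left\{P_i^{(j)} : p^{(ij)}_{(r)} > \frac{(r+1)kq}{m\, m_i}, \dots, p^{(ij)}_{(m_i - 1)} > \frac{kq}{m}\right\},$$

where $p^{(ij)}_{(1)} \le p^{(ij)}_{(2)} \le \dots \le p^{(ij)}_{(m_i - 1)}$ is the ordered set of p-values in the range of $P_i^{(j)}$. It is easy to see that

$$A_s^{(ij)} = \bigcup_{t=1}^{s}\ \bigcup_{(k, r) \in I_t}\left(D_k^{(i)} \cap G_r^{(ij)}[k]\right).$$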

Obviously, both $D_k^{(i)}$ and $G_r^{(ij)}[k]$ are increasing sets. Unions and intersections of increasing sets are also increasing sets; hence $A_s^{(ij)}$ is an increasing set. The PRDS property on the subset of p-values corresponding to the true null hypotheses implies that for each $i = 1, \dots, m$, $j = 1, \dots, m_{0i}$, and any $a \le a'$,

$$\Pr\left(A_s^{(ij)} \mid P_{ij} \le a\right) \le \Pr\left(A_s^{(ij)} \mid P_{ij} \le a'\right). \qquad (24)$$

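It is obvious that for $(k, r) \ne (k', r')$, the sets $C_k^{(i)} \cap B_r^{(ij)}[k]$ and $C_{k'}^{(i)} \cap B_{r'}^{(ij)}[k']$ are disjoint. Therefore, for $t \ne t'$, the sets $\bigcup_{(k, r) \in I_t}\left(C_k^{(i)} \cap B_r^{(ij)}[k]\right)$ and $\bigcup_{(k, r) \in I_{t'}}\left(C_k^{(i)} \cap B_r^{(ij)}[k]\right)$ are disjoint as well. Now we obtain for any $a$:

$$\Pr\left(A_s^{(ij)} \mid P_{ij} \le a\right) = \sum_{t=1}^{s}\sum_{(k, r) \in I_t}\Pr\left(C_k^{(i)} \cap B_r^{(ij)}[k] \mid P_{ij} \le a\right). \qquad (25)$$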

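Using (24) and (25) we obtain that for any $s = 1, \dots, m\, m_i - 1$:

$$\Pr\left(A_s^{(ij)} \,\middle|\, P_{ij} \le \frac{sq}{m\, m_i}\right) + \sum_{(k, r) \in I_{s+1}}\Pr\left(C_k^{(i)} \cap B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{(s+1)q}{m\, m_i}\right)$$
$$\le \Pr\left(A_s^{(ij)} \,\middle|\, P_{ij} \le \frac{(s+1)q}{m\, m_i}\right) + \sum_{(k, r) \in I_{s+1}}\Pr\left(C_k^{(i)} \cap B_r^{(ij)}[k] \,\middle|\, P_{ij} \le \frac{(s+1)q}{m\, m_i}\right) = \Pr\left(A_{s+1}^{(ij)} \,\middle|\, P_{ij} \le \frac{(s+1)q}{m\, m_i}\right),$$

and Proposition 7.1 follows.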